Marginal Asymptotics for the “Large p, Small n” Paradigm: With Applications to Microarray Data
Authors
Abstract
The “large p, small n” paradigm arises in microarray studies, where expression levels of thousands of genes are monitored for a small number of subjects. There has been an increasing demand for the study of asymptotics for the various statistical models and methodologies applied to genomic data. In this article, we focus on one-sample and two-sample microarray experiments, where the goal is to identify significantly differentially expressed genes. We establish uniform consistency of certain estimators of marginal distribution functions, sample means and sample medians under the “large p, small n” assumption. We also establish uniform consistency of marginal p-values based on certain asymptotic approximations, which permit inference based on false discovery rate techniques. The effects of the normalization process on these results are also investigated. Simulation studies and data analyses are used to assess finite-sample performance.
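To make the one-sample workflow concrete, the sketch below computes a marginal p-value for each gene and then applies a false discovery rate procedure across all p genes. It is a minimal illustration under stated assumptions, not the authors' method: the choice of the one-sample t statistic, the standard normal approximation for its null distribution, the Benjamini-Hochberg step-up procedure as the FDR technique, and the helper names `marginal_pvalues` and `benjamini_hochberg` are all assumptions made for this example. With n as small as 10, the normal approximation is only a rough stand-in; how well such approximations hold uniformly over many genes is exactly the question the paper studies.

```python
import math
import random
import statistics


def marginal_pvalues(data):
    """Two-sided p-values for H0: mean = 0, one per gene (row).

    data[j] holds the expression values of gene j across the n subjects.
    Assumption: the one-sample t statistic is referred to a standard normal
    distribution (an asymptotic approximation chosen for illustration).
    """
    pvals = []
    for row in data:
        n = len(row)
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)  # sample SD; requires n >= 2 and non-constant data
        t = mean / (sd / math.sqrt(n))
        # Two-sided normal tail probability: P(|Z| > |t|) = erfc(|t| / sqrt(2))
        pvals.append(math.erfc(abs(t) / math.sqrt(2)))
    return pvals


def benjamini_hochberg(pvals, q=0.05):
    """Indices of genes rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    p = len(pvals)
    order = sorted(range(p), key=lambda j: pvals[j])  # genes sorted by p-value
    k_max = 0
    for rank, j in enumerate(order, start=1):
        # Step-up rule: keep the largest rank k with p_(k) <= k * q / p
        if pvals[j] <= rank * q / p:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    random.seed(1)
    n_subjects, n_genes, n_shifted = 10, 1000, 50
    # Hypothetical simulated data: the first 50 genes have mean 2, the rest mean 0
    data = [
        [random.gauss(2.0 if j < n_shifted else 0.0, 1.0) for _ in range(n_subjects)]
        for j in range(n_genes)
    ]
    rejected = benjamini_hochberg(marginal_pvalues(data), q=0.05)
    print(len(rejected), "genes flagged;", sum(j < n_shifted for j in rejected), "truly shifted")
```

A two-sample version would replace the per-gene statistic with a difference of group means (or medians) and its standard error; the FDR step is unchanged because it only consumes the vector of marginal p-values.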
Similar resources
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next-generation sequencing (NGS) data are important sources for finding helpful molecular patterns. Moreover, the sheer volume of gene expression data makes it harder to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
Local likelihood regression in generalized linear single-index models with applications to microarray data
Searching for an effective dimension reduction space is an important problem in regression, especially for high dimensional data such as microarray data. A major characteristic of microarray data consists in the small number of observations n and a very large number of genes p. This “large p, small n” paradigm makes the discriminant analysis for classification difficult. In order to offset this...
A new test for sphericity of the covariance matrix for high dimensional data
AMS subject classifications: 62H10, 62H15. Keywords: covariance matrix; hypothesis testing; high-dimensional data analysis. Abstract: In this paper we propose a new test procedure for sphericity of the covariance matrix when the dimensionality, p, exceeds the sample size, N = n + 1. Under the assumptions that (A) 0 < trΣ the concentration, a new statistic is developed utilizing the rat...
The False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data
Background and Objectives: In recent years, new technologies have led to the production of large amounts of data, and in the field of biology, microarray technology has also developed dramatically. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and also to detect the differentially expressed genes. In this study, the false discovery rate was investiga...
Investigation on metabolism of cisplatin resistant ovarian cancer using a genome scale metabolic model and microarray data
Objective(s): Many cancer cells show significant resistance to drugs that kill drug-sensitive cancer cells and non-tumor cells, and such resistance might be a consequence of differences in metabolism. Therefore, the objective of this research is to study the metabolism of drug-resistant cancer cells and compare it with that of drug-sensitive and normal cell lines. Materials and Methods: Metabolism of c...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2005